Abstract

Background: 3D domain swapping is a novel structural phenomenon observed in a diverse set of protein structures in oligomeric conformations. It is characterized by a distinct structural feature in which structural segments of a protein dimer or higher oligomer are shared between two or more chains. 3D domain swapping has been observed as a key mediator of numerous functional mechanisms and plays a pathogenic role in various diseases, including conformational diseases such as amyloidosis, Alzheimer's disease, Parkinson's disease and prion diseases. We report the first study focused on identifying the functional classes, pathways and diseases mediated by 3D domain swapping in the human proteome.

Methods: We used a panel of four enrichment tools with two different ontologies and two annotation databases to derive biologically and clinically relevant information associated with 3D domain swapping. Protein domain enrichment analysis followed by Gene Ontology (GO) term enrichment analysis revealed the functional repertoire of proteins involved in swapping. Pathway analysis using KEGG annotations revealed diverse pathway associations of human proteins involved in 3D domain swapping. The Disease Ontology was used to find statistically significant associations (P-value < 0.05) between proteins in swapped conformation and various disease categories.

Results: We report meta-analysis results for a literature-curated dataset of human gene products involved in 3D domain swapping and discuss new insights into the functional repertoire, pathway associations and disease implications of proteins involved in 3D domain swapping.

Conclusions: Our integrated bioinformatics pipeline, comprising four different enrichment tools, two ontologies and two annotation databases, revealed new insights into the functional and disease correlations of 3D domain swapping. GO term enrichment was used to infer terms associated with the three GO categories. Protein domain enrichment was used to identify conserved domains enriched in swapped proteins. Pathway enrichment analysis using KEGG annotations revealed that proteins with swapped conformations are present in all six classes of the KEGG BRITE hierarchy, and significantly enriched KEGG pathways were observed in five classes. Functional disease ontology based enrichment analysis found five major classes of human disease to be significantly associated with 3D domain swapping: cancer, diseases of the respiratory or pulmonary system, degenerative diseases of the central nervous system, vascular disease and encephalitis. In conclusion, our study shows that bioinformatics-based analytical approaches using curated data can enhance the understanding of the functional and disease implications of 3D domain swapping.

Keywords: Protein aggregation, Human disease, Deposition disease, Human proteome, Data integration, Biological 
data mining 
Background

Computationally efficient classification, annotation and prediction algorithms are rapidly improving our understanding of protein sequence-structure-function relationships. Analysis of such relationships often clarifies the role of novel sequence or structural features in regulating a particular function, including molecular pathways and various disease mechanisms. Cells attain their functional integrity with the help of molecular mechanisms including protein-protein interactions [1-7]. Protein folding and subsequent oligomerization of protein chains support such interactions in the cellular environment, and protein-protein interactions in turn play a key role in mediating higher-order oligomerization. Protein-protein interactions are diverse and can be broadly classified as transient interactions, which are weak, and obligate interactions, which are permanent. Based on sequence homology, two proteins with a high degree of similarity could interact and form a homodimer, whereas two distantly related proteins could form a heterodimer [8,9]. 3D domain swapping is a unique protein structural mechanism observed in homodimers or higher-order oligomers with a specific type of interaction, in which segments of two protein chains are mutually exchanged. 3D domain swapping has also been observed in protein structures in hetero-oligomeric conformations, and it has been associated with several proteins involved in diverse functional events and disease pathways. Previous studies of the structural properties of 3D domain swapping indicated that swapped structures share structural features with oligomeric protein complexes and are primarily associated with deposition diseases [10-13]. Prior studies on 3D domain swapping focused on a small set of proteins, largely because no curated database of proteins involved in 3D domain swapping was available. In this study, we present results from the analysis of human proteins curated in the 3DSwap knowledgebase using multiple biological enrichment methods. 3DSwap is the first database to catalogue proteins involved in 3D domain swapping. It was developed using a literature-based protein structural curation strategy that combined manual curation with a structural bioinformatics pipeline to gather data pertaining to 3D domain swapping. We used the complete set of human proteins from the 3DSwap database and identified statistically significantly enriched protein domains, biological processes, cellular components, molecular functions, biological pathways and diseases using enrichment methods. From a bioinformatics perspective, this manuscript is a case study that leverages robust bioinformatics methods to gain new functional and therapeutic insights from a protein structural mechanism.

3D domain swapping: Pathophysiological basis of deposition diseases

3D domain swapping is a unique protein structural phenomenon with implications for function, form and disease (Figure 1). The figure depicts only two scenarios of 3D domain swapping (a domain-swapped dimer and open-ended oligomeric swapping); other scenarios, such as double domain swapping, cyclic swapping and entirely swapped structures, have also been observed in proteins with swapped oligomeric architecture. Protein structures involved in 3D domain swapping are characterized by hinge regions and swapped regions. 3D domain swapping involves the mutual exchange of a structural segment between two or more chains in a protein oligomer. This mechanism has been observed in a diverse group of proteins that mediate different structural, functional and physiological mechanisms. 3D domain swapping was initially defined as a mechanism for functional or structural oligomeric assembly; more recently, it has been proposed as a molecular mechanism behind protein aggregation and thus implicated as a pathogenic basis of deposition or conformational diseases [14], amyloidosis [15], serpinopathies [16] and proteinopathies [16]. Proteins involved in such diseases have higher aggregation propensities and form highly specific aggregates of a single protein. From a structural perspective, some of these aggregates are generated by the 3D domain swapping mechanism [12-14,17-33]. From a clinical perspective, the diverse disease manifestations mediated by this single structural mechanism are of great interest. It remains unclear whether 3D domain swapping is exclusively associated with such conformational diseases or whether it also plays a crucial role in mediating complex diseases.

Dataset of human proteins involved in domain swapping 

Despite numerous biochemical and computational studies of the molecular basis of 3D domain swapping [11,34-52], a detailed account of the functional repertoire of proteins in swapped conformation, including their protein domains, Gene Ontology (GO) terms, biological pathways and disease associations, has not been reported. The mechanism of 3D domain swapping has been reported in different evolutionary lineages, and structures in swapped conformation have been identified in multiple organisms, a large proportion of them eukaryotes. Until now, a proteome-wide analysis of this unique structural mechanism has not been possible because no curated proteome-level dataset was available.
Figure 1 Schematic representation of 3D domain swapping
Recently, we integrated in-depth literature curation and structural bioinformatics analytics to curate proteins involved in 3D domain swapping from the Protein Data Bank (PDB) and reported a knowledgebase of such proteins, 3DSwap [53]. 3DSwap offers a compendium of 293 protein structures with delineated hinge and swapped regions and is an ideal resource for studying the functional and structural implications of domain swapping.

Inference from biological and biomedical ontologies using enrichment analysis

Enrichment analysis plays an important role in knowledge-based bioinformatics approaches [54,55]. In this study, enrichment analysis was performed using annotations derived from Pfam domains [56], GO [57-59], KEGG pathways [60] and the Disease Ontology (DO) [61,62]. Enrichment analysis in bioinformatics is a collective term for a group of statistical algorithms developed to understand the global trends of a subset of genes or gene products compared to a background population (for example, all genes in the human genome, all proteins encoded in the human genome, all genes tested in a given experiment, or the genes represented on a gene expression platform). Huang et al. [54] proposed a nomenclature that classifies enrichment tools into singular enrichment analysis (SEA), gene set enrichment analysis (GSEA) [63] and modular enrichment analysis (MEA) [55]. These three classes of algorithms differ fundamentally in how the enrichment P-value is calculated. In SEA-based approaches, the annotation terms of a subset of genes are assessed one at a time against a list of background genes. An enrichment P-value is calculated by comparing the observed frequency of an annotation term with the frequency expected by chance, and individual terms that pass the P-value cut-off (for example, P-value < 0.05) are reported as enriched. BiNGO [64], FuncAssociate [65] and Onto-Express [66,67] are examples of SEA-based enrichment analysis tools.
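As a minimal sketch of the SEA calculation just described, the function below computes the probability of observing at least k annotated genes in a study set of size n drawn from a background of N genes, K of which carry the term. This is the upper tail of the hypergeometric distribution, equivalent to a one-tailed Fisher's exact test. The function name and the toy counts are our own illustrative assumptions, not code taken from BiNGO, FuncAssociate or Onto-Express.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background population size (e.g. annotated human proteins)
    K: background members annotated with the term
    n: size of the study set
    k: study-set members annotated with the term
    """
    upper = min(n, K)
    # Sum exact integer counts first, then divide once to limit rounding error.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


# Toy example: 20 background genes, 5 carry the term, 4 in the study set, 3 annotated.
p = hypergeom_upper_tail(k=3, n=4, K=5, N=20)
print(f"{p:.4f}")  # prints 0.0320 (exactly 155/4845)
```

The term would be called enriched at the 0.05 cut-off in this toy case; with real data N would be the size of the annotated background and the test is repeated once per term.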
GSEA approaches are similar, but consider all genes during the enrichment analysis instead of only the genes that pass a pre-defined threshold, as in the SEA approach. MEA approaches additionally exploit relationships between annotation terms: for example, Gene Ontology terms are connected by relationships, and MEA-based programs such as Ontologizer [68] and topGO [69] employ the relationships that exist between the annotations. These programs were reported to attain better sensitivity and specificity because they take GO term relationships into account.

GSEA is an enrichment-based computational method that determines whether an a priori defined set of genes shows statistically significant differences between two biological states [63]. For example, genes measured in a gene expression analysis of a particular type of cancer can be ranked by their differential expression, and the a priori gene sets can be drawn from one or more collections compiled in the Molecular Signatures Database (MSigDB) [70].

A variety of tools are currently available for functional enrichment analysis; a recent review cited 69 such tools, and the list is growing rapidly. The majority of these tools assess the significance of the association between GO terms and the gene list, relative to the background distribution, using Fisher's exact test [71,72], the hypergeometric distribution [64], the binomial test [72], χ2 tests [73], or a combination of such methods as implemented in tools like GFINDER [74] and Onto-Express [66,67]. The concept of gene set enrichment has been incorporated into various programs that use biological or functional annotations of genes and gene products to perform enrichment calculations with ontologies and annotations. GO enrichment and pathway enrichment analyses employ similar conceptual and statistical methods to characterize the functional and molecular roles of a subset of genes or proteins, and they have proved very efficient at summarizing trends of functional diversity or similarity. Such approaches are routinely employed in gene expression studies, high-throughput screening experiments and genome-wide association studies (GWAS) [75,76].
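The choice among the tests listed above can matter when the background is small. The sketch below, using toy counts we chose for illustration, contrasts the hypergeometric tail (sampling without replacement, as in Fisher's exact test) with a binomial approximation (sampling with replacement, each study gene annotated with probability K/N). Neither function reproduces any particular tool's implementation.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    # Sampling without replacement from the background (Fisher/hypergeometric view).
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def binomial_upper_tail(k, n, K, N):
    # Sampling with replacement: each study gene carries the term with probability K/N.
    p = K / N
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


args = dict(k=3, n=4, K=5, N=20)
print(hypergeom_upper_tail(**args))  # about 0.0320
print(binomial_upper_tail(**args))   # 0.05078125
```

With these counts the binomial tail crosses the 0.05 cut-off while the hypergeometric tail does not; as the background grows relative to the study set, the two values converge.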

GO enrichment and pathway enrichment analyses, using ontologies or annotations of a subset of genes characterized in an experimental or computational study, are generally applied to infer new biological insights that would otherwise be out of reach of candidate-gene-centric approaches. Because the statistical methods used in enrichment analysis are generic, the current set of enrichment algorithms and related statistical methods can be used to infer enrichment from annotation databases in general. Enrichment calculations are currently available for various types of annotations: protein domains (Pfam [56], SMART [77]), pathways (KEGG [60], GenMAPP [78]) and human gene-disease associations from Online Mendelian Inheritance in Man (OMIM) [79] are all used for enrichment analysis. Similarly to GO, any ontology maintained by the Open Biological and Biomedical Ontologies (OBO) Foundry [80] (for example, the Disease Ontology (DO) [62]), or any mapping or derivative of such an ontology (for example, Disease Ontology Lite (DOLite) [61]), can be used effectively for enrichment analysis.
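Because any OBO-format ontology can feed the same enrichment machinery, the first practical step is usually reading the term graph. The sketch below is a deliberately minimal reader for [Term] stanzas of an OBO flat file that keeps only id, name and is_a parents and skips obsolete terms; it is our own illustrative assumption, ignores other tags and escaping rules, and is not how any of the tools cited here parse ontologies.

```python
def parse_obo_terms(text):
    """Return {term_id: {"name": ..., "is_a": [...]}} for non-obsolete [Term] stanzas."""
    terms = {}
    current = None  # dict for the [Term] stanza being read; None inside other stanzas

    def flush():
        # Store the finished stanza unless it is incomplete or marked obsolete.
        if current and "id" in current and not current.get("obsolete"):
            terms[current["id"]] = current

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            flush()
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines and non-Term stanzas
        tag, value = line.split(":", 1)
        value = value.strip()
        if tag == "id":
            current["id"] = value
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            # "GO:0008150 ! biological_process" -> keep only the identifier
            current["is_a"].append(value.split("!", 1)[0].strip())
        elif tag == "is_obsolete" and value == "true":
            current["obsolete"] = True
    flush()
    return terms


sample = """format-version: 1.2

[Term]
id: GO:0008150
name: biological_process

[Term]
id: GO:0009987
name: cellular process
is_a: GO:0008150 ! biological_process
"""
print(parse_obo_terms(sample)["GO:0009987"]["is_a"])  # ['GO:0008150']
```

The resulting child-to-parent map is exactly what a DAG-aware enrichment method needs; the same reader would work on a Disease Ontology file, since it uses the same stanza format.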

Enrichment tools, ontologies, annotation databases and statistical methods

This study used four tools, two ontologies and two annotation databases to infer functional and disease insights from a list of human proteins involved in 3D domain swapping. Protein domain enrichment was performed using DAVID 6.7, with protein domain annotations derived from Pfam, a database of evolutionarily conserved protein domain families. GO term enrichment was performed using Ontologizer 2.0, a tool with a command-line interface that implements improved statistical methods for deriving the GO terms enriched in a given list of proteins. SubPathwayMiner, an R package that internally handles KEGG annotations for pathway enrichment analysis, was used to derive statistically significant pathways associated with the dataset. Enriched disease ontology terms were identified using the Functional Disease Ontology server, which consults the Disease Ontology and its derivative, Disease Ontology Lite, to identify significant diseases. Our null hypothesis (H0) was that the list of curated proteins with swapped conformations is not associated with any class of protein domains, GO terms, KEGG pathways or Disease Ontology terms. We tested this null hypothesis separately with each of the four tools and its associated annotations or ontologies, obtaining P-values with the default statistical settings of each tool. Protein domain enrichment P-values were derived from DAVID using a modified Fisher's exact P-value called the EASE score [81]. GO term enrichment P-values were derived using Ontologizer 2.0 and corrected using the Bonferroni method [68]. KEGG pathway enrichment was performed using SubPathwayMiner, which provides False Discovery Rate (FDR) corrected P-values. Disease enrichment analysis was performed using the Functional Disease Ontology server, which uses Fisher's exact test to derive P-values.
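The two multiple-testing corrections named above can be sketched in a few lines. This is a generic illustration of the Bonferroni and Benjamini-Hochberg FDR procedures on a plain list of raw P-values; it is not the code inside Ontologizer or SubPathwayMiner, and we have not verified which FDR variant SubPathwayMiner applies.

```python
def bonferroni(pvalues):
    """Multiply each raw P-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up adjustment; returns FDR-adjusted P-values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03]
print(bonferroni(raw))          # approximately [0.03, 0.12, 0.09]
print(benjamini_hochberg(raw))  # approximately [0.03, 0.04, 0.04]
```

Bonferroni controls the family-wise error rate and is the more conservative of the two, which is why the same raw P-values survive an FDR threshold more easily than a Bonferroni one.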

Methods 

Curated dataset of human proteins involved in 3D domain swapping

Classification of the proteins in the 3DSwap knowledgebase based on the SOURCE record of each PDB entry, followed by mapping using SIFTS annotations, revealed that 75 of the 293 structures reported in 3DSwap were from Homo sapiens. A look at the taxonomic spread of the 3DSwap database indicates that humans account for the largest fraction of structures (25.6%) (Figure 2).
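The percentage above is a simple per-organism tally over structures. The sketch below reproduces the arithmetic; only the human count (75 of 293) comes from the text, and all other structures are lumped into a hypothetical "other" bucket, which aggregates many organisms and is therefore not itself a taxon.

```python
from collections import Counter


def taxonomic_shares(source_organisms):
    """Percentage of structures per source organism (one entry per structure), largest first."""
    counts = Counter(source_organisms)
    total = sum(counts.values())
    return {org: round(100 * n / total, 1) for org, n in counts.most_common()}


# Hypothetical input: 75 human structures and 218 non-human ones.
shares = taxonomic_shares(["Homo sapiens"] * 75 + ["other"] * 218)
print(shares)  # {'other': 74.4, 'Homo sapiens': 25.6}
```

With the real per-organism SOURCE records in place of the "other" bucket, the same tally would give the species-level distribution.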
Figure 2 Taxonomic (a) and species (b) level distribution of proteins in swapped conformation from 3DSwap knowledgebase



We used literature-curated structures from the 3DSwap database with delineated 'hinge' and 'swapped' regions for the analysis (see Additional file 1: Supplementary Table 1 for the list of proteins used in this study). The 75 PDB identifiers were mapped to UniProt and KEGG database identifiers using the Protein Identifier Cross-Reference (PICR) service and custom Perl scripts [82]. Of the 75 curated protein structures in 3D domain-swapped conformation retrieved from the 3DSwap knowledgebase, 45 proteins were unique (see Table 1). Human proteins in our curated dataset had several redundant structures, so to avoid potential functional bias, only the unique human proteins (45 of the 75 structures) were used in this analysis. A graphical summary of the bioinformatics pipeline employed in this study is depicted in Figure 3.

Table 1 Enriched Pfam domains associated with proteins involved in 3D domain swapping

Pfam identifier | Pfam description | P-value
PF07714 | Protein tyrosine kinase | 3.0E-6
PF00031 | Cystatin domain | 1.1E-5
PF01463 | Leucine rich repeat C-terminal domain | 1.9E-3
PF00625 | Guanylate kinase | 3.3E-4
PF07679 | Immunoglobulin I-set domain | 6.6E-3
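The de-duplication step described above (many structures collapsed to one entry per protein) can be sketched as grouping PDB entries by their mapped UniProt accession. The identifiers below are hypothetical, and the plain dictionary stands in for the output of the PICR service and Perl scripts, whose exact format we do not assume.

```python
def collapse_to_unique_proteins(pdb_to_uniprot):
    """Group PDB structures by UniProt accession so each protein is counted once.

    pdb_to_uniprot: PDB ID -> UniProt accession, or None when no mapping was found.
    Returns (accession -> list of PDB IDs, list of unmapped PDB IDs).
    """
    by_accession, unmapped = {}, []
    for pdb_id, accession in sorted(pdb_to_uniprot.items()):
        if accession is None:
            unmapped.append(pdb_id)
        else:
            by_accession.setdefault(accession, []).append(pdb_id)
    return by_accession, unmapped


# Hypothetical identifiers, for illustration only.
mapping = {"PDB_1": "ACC_A", "PDB_2": "ACC_A", "PDB_3": "ACC_B", "PDB_4": None}
proteins, missing = collapse_to_unique_proteins(mapping)
print(len(mapping), "structures ->", len(proteins), "unique proteins; unmapped:", missing)
```

Keeping the list of PDB entries per accession, rather than discarding them, makes it easy to check which redundant structures were folded into each protein.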

Enrichment analysis of human proteins involved in 3D domain swapping

Protein domain enrichment analysis was performed using DAVID [81], GO term enrichment analysis using Ontologizer 2.0 [68], KEGG pathway analysis using SubPathwayMiner [83] and Disease Ontology analysis using the Functional Disease Ontology server [61,62].

Protein domain enrichment analysis 

To perform protein domain enrichment analysis, the domains present in the proteins involved in 3D domain swapping were identified and compiled into a list. This list of protein domains was compared against a reference dataset of protein domains from the complete human proteome. Protein domain enrichment analysis was performed to identify statistically significant, conserved functional modules associated with proteins involved in 3D domain swapping. The dataset of 45 UniProt identifiers was used for protein domain enrichment analysis with Pfam annotations, using DAVID version 6.7 with default settings.

Figure 3 Bioinformatics pipeline employed to derive functional, pathway and disease associations of proteins involved in 3D domain swapping
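The domain enrichment described above compares domain frequencies in the study set with those in the whole proteome. The sketch below shows one way to do this at the protein level (each protein counts once per domain family) with a plain hypergeometric test. Note that DAVID reports the EASE score, a modified Fisher's exact P-value, so this sketch is not DAVID's calculation; the protein and domain identifiers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def domain_enrichment(study, domains_by_protein, alpha=0.05):
    """Return {domain: P-value} for domain families over-represented in `study`.

    domains_by_protein: protein ID -> set of domain family IDs for the whole
    background proteome. A protein counts once per family, however many
    copies of the domain it carries.
    """
    population = set(domains_by_protein)
    study = set(study) & population  # ignore study proteins missing from the background
    N, n = len(population), len(study)
    all_domains = set().union(*domains_by_protein.values())
    hits = {}
    for dom in sorted(all_domains):
        K = sum(dom in domains_by_protein[p] for p in population)
        k = sum(dom in domains_by_protein[p] for p in study)
        if k:  # only domains observed in the study set are tested
            hits[dom] = hypergeom_upper_tail(k, n, K, N)
    return {d: p for d, p in hits.items() if p < alpha}


# Hypothetical 10-protein background with made-up domain family IDs.
background = {f"prot{i}": set() for i in range(10)}
for i in (0, 1, 2):
    background[f"prot{i}"].add("PF_X")
for i in range(3, 10):
    background[f"prot{i}"].add("PF_Y")
for i in (0, 5):
    background[f"prot{i}"].add("PF_Z")
print(domain_enrichment(["prot0", "prot1", "prot2"], background))  # only PF_X, P = 1/120
```

Counting proteins rather than domain copies keeps multi-copy domains (repeats, for instance) from inflating the statistic.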

Gene ontology enrichment analysis 

GO term enrichment analysis in this study was performed using Ontologizer 2.0, a multifunctional tool for GO term enrichment analysis, selected because of the improved statistical methods it implements. A brief description of the method is provided here. Generic GO enrichment tools calculate the enrichment of a GO term, given the list of genes in the dataset and the background population, as the probability of drawing the same or a higher number of genes annotated to that term. This basic concept is implemented with a statistical test based on the upper tail of the hypergeometric distribution, equivalent to a one-tailed Fisher's exact test. Such methods do not consider relationships between the annotation terms. GO is structured as a directed acyclic graph (DAG), with various types of relationships between terms. Because of this DAG architecture, a gene or gene product annotated with a term x is also annotated to all parent terms of x, which introduces dependencies between terms and often leads to false-positive enrichment calls. Ontologizer 2.0 takes such relationships (for example: is_a, part_of, has_part, regulates) into account using parent-child approaches [84]. A detailed description of the statistical method implemented in Ontologizer 2.0 can be found elsewhere [68,84]. The dataset of 45 UniProt identifiers was used for species-specific (Homo sapiens) GO enrichment analysis and pathway analysis. GO enrichment analysis was performed in Ontologizer 2.0 with the following parameters: GO annotations were derived from the human-specific annotation file (gene_association.goa_human) [58], multiple testing correction was set to the Bonferroni method, the enrichment calculation was set to Parent-Child-Intersection, and the number of resampling steps was set to 1000. The Gene Ontology was defined by the 33,738 terms and 59,508 relations recorded in the gene_ontology.obo file (downloaded in February 2011). The background population for the statistical tests was defined as the 18,257 proteins encoded in the human genome that have GO annotations.
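To illustrate why the parent-child setting matters, the sketch below contrasts a classic term-for-term test with a parent-child-intersection test, following our reading of the method in [84]: the child term is tested only among genes annotated to all of its parents, both in the population and in the study set. Annotations are first propagated to ancestor terms (the true-path rule). The DAG, terms and genes are hypothetical, and Ontologizer's implementation may differ in details.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def ancestors_of(term, parents):
    """All ancestors of `term` in a DAG given as term -> list of parent terms."""
    stack, seen = list(parents.get(term, ())), set()
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def propagate(direct, parents):
    """True-path rule: annotate each gene to its direct terms and all their ancestors."""
    return {g: set(ts).union(*(ancestors_of(t, parents) for t in ts)) for g, ts in direct.items()}


def term_for_term(term, study, population, annotations):
    """Classic per-term test that ignores the DAG structure."""
    K = sum(term in annotations[g] for g in population)
    k = sum(term in annotations[g] for g in study)
    return hypergeom_upper_tail(k, len(study), K, len(population))


def parent_child_intersection(term, study, population, annotations, parents):
    """Test `term` only among genes annotated to *all* of its parents."""
    pars = parents.get(term, ())
    pop_pa = {g for g in population if all(p in annotations[g] for p in pars)}
    study_pa = {g for g in study if g in pop_pa}
    return term_for_term(term, study_pa, pop_pa, annotations)


# Toy DAG: B is_a A is_a R (hypothetical terms and genes).
parents = {"A": ["R"], "B": ["A"]}
direct = {"g0": {"B"}, "g1": {"B"}, "g2": {"A"}, "g3": {"A"},
          "g4": {"R"}, "g5": {"R"}, "g6": {"R"}, "g7": {"R"}}
annotations = propagate(direct, parents)
population, study = set(annotations), {"g0", "g1", "g2"}
print(term_for_term("B", study, population, annotations))                        # 6/56, about 0.107
print(parent_child_intersection("B", study, population, annotations, parents))  # 0.5
```

In the toy case, term B looks moderately enriched on its own, but once the test is restricted to genes already annotated to its parent A, the signal disappears: B's apparent enrichment is inherited from A.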

KEGG based pathway enrichment analysis of proteins in the human proteome with swapped conformation

Pathway enrichment analysis using KEGG pathway annotations was performed to understand the roles of proteins in 3D domain-swapped conformation in various biological pathways. UniProt identifiers were mapped to Entrez Gene identifiers using custom Perl scripts and used as input to the R package SubPathwayMiner [83] for pathway enrichment analysis. Pathways associated with these proteins were obtained from the KEGG pathway database and compared against a reference consisting of the full list of proteins and their corresponding pathways annotated in KEGG.
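The pathway workflow described above (map identifiers, test each pathway against a reference, correct for multiple testing) can be sketched end to end. This is a simplified whole-pathway analysis with a Benjamini-Hochberg adjustment, not SubPathwayMiner's subpathway method; the UniProt accessions, Entrez IDs and pathway names below are hypothetical placeholders.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def pathway_enrichment(study_uniprot, uniprot_to_entrez, pathways, background):
    """Whole-pathway enrichment with Benjamini-Hochberg adjusted P-values.

    study_uniprot: UniProt accessions of the study proteins.
    uniprot_to_entrez: accession -> Entrez Gene ID (unmapped accessions are dropped).
    pathways: pathway ID -> set of Entrez Gene IDs.
    background: set of Entrez Gene IDs forming the reference population.
    """
    study = {uniprot_to_entrez[u] for u in study_uniprot if u in uniprot_to_entrez}
    study &= background
    N, n = len(background), len(study)
    raw = {}
    for pid, genes in pathways.items():
        members = genes & background
        k = len(members & study)
        if k:  # pathways with no study genes are not tested (and not counted in m)
            raw[pid] = hypergeom_upper_tail(k, n, len(members), N)
    ranked = sorted(raw, key=raw.get)  # smallest raw P-value first
    m, adjusted, running = len(ranked), {}, 1.0
    for rank in range(m, 0, -1):  # step-up: largest P-value first, keep monotone
        pid = ranked[rank - 1]
        running = min(running, raw[pid] * m / rank)
        adjusted[pid] = running
    return adjusted


# Hypothetical identifiers; "U9" has no Entrez mapping and is dropped.
adj = pathway_enrichment(
    ["U1", "U2", "U3", "U9"],
    {"U1": "e1", "U2": "e2", "U3": "e3"},
    {"path_A": {"e1", "e2", "e3"}, "path_B": {"e3", "e4", "e5", "e6", "e7", "e8"}},
    {f"e{i}" for i in range(1, 11)},
)
print(adj)
```

Testing only pathways that contain at least one study gene keeps the number of tests small; whether a given tool does the same affects how strict its FDR correction is.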

Disease enrichment analysis of proteins in swapped conformation using disease ontology

Disease Ontology term enrichment analysis was performed using the Functional Disease Ontology server [62]. The 45 human genes, represented by UniProt identifiers, were mapped to Entrez Gene identifiers using custom Perl scripts, and the resulting list of Entrez identifiers was used as input for Disease Ontology enrichment to understand the roles of the human proteins with swapped conformation in various diseases. Of the 45 genes in the list, 35 were associated with at least one disease. Briefly, the disease associations of each gene in the human genome were annotated using the Disease Ontology and peer-reviewed evidence from Gene Reference Into Function (GeneRIF) [61,62,85]. A condensed version of the Disease Ontology, Disease Ontology Lite [61], was used for the statistical analysis. As in the Gene Ontology analysis, the significance of each disease association was evaluated using Fisher's exact test.
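For a single disease term, the Fisher's exact test mentioned above is usually framed as a 2x2 contingency table. The sketch below computes the one-sided (over-representation) P-value directly from such a table; the table layout and the counts are our illustrative assumptions, not the Functional Disease Ontology server's internals.

```python
from math import comb


def fisher_exact_greater(table):
    """One-sided (over-representation) Fisher's exact test for a 2x2 table.

    table = [[a, b], [c, d]]
      a: study genes annotated to the disease term   b: study genes not annotated
      c: other background genes annotated            d: other background genes not annotated
    Returns P(A >= a) with all row and column totals held fixed.
    """
    (a, b), (c, d) = table
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    numerator = sum(comb(col1, i) * comb(total - col1, row1 - i) for i in range(a, upper + 1))
    return numerator / comb(total, row1)


# Toy counts: 3 of 4 study genes vs 2 of 16 other genes carry a hypothetical DO term.
print(fisher_exact_greater([[3, 1], [2, 14]]))
```

For real analyses, `scipy.stats.fisher_exact(table, alternative="greater")` should return the same one-sided value for the same table.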

Results 

3D domain swapping is a structural mechanism employed by a variety of protein structures to form oligomeric assemblies. These oligomers have often been associated with aggregation diseases or proteinopathies in humans. Parkinson's disease and Alzheimer's disease are two major neurodegenerative diseases in which the phenotypic impact of 3D domain swapping has been implicated. To date, no comprehensive study has analyzed all proteins involved in 3D domain swapping from a proteome-wide or genome-wide perspective, owing to the lack of a well-defined, curated dataset. We performed an initial investigation of proteins involved in 3D domain swapping at the level of protein domains, Gene Ontology, KEGG pathways and Disease Ontology. Our approach helped to identify the protein domains, Gene Ontology terms, biological pathways and Disease Ontology terms enriched among these proteins, and their role in mediating various human diseases.

Statistically significant protein domains (Table 1), GO terms (Tables 2, 3 and 4), KEGG pathways (Table 5) and DO terms (Table 6) associated with swapped proteins encoded in the human proteome are provided. Critical aspects of the statistically significant evolutionarily conserved domains, GO terms, KEGG pathways and DO terms associated with human proteins in swapped conformation are summarized in the 'Discussion' section.

Proteins involved in 3D domain swapping represent a large collection of proteins with a variety of functional and regulatory roles in the cell. Because of limitations in crystallizing structures in the swapped conformation, the currently available repertoire of proteins in swapped conformation may represent only a small fraction of the proteins that perform their molecular roles via 3D domain swapping. Machine learning algorithms and other computational approaches may help to predict more proteins with features of 3D domain swapping [11,52]. Here we discuss the primary insights obtained from this initial investigation of proteins involved in 3D domain swapping. The present results from the human proteome suggest that future drug design studies focusing on the disease categories or pathways associated with 3D domain swapping should consider the structural implications of this important structural mechanism and of associated mechanisms such as macromolecular crowding and protein aggregation.

Table 2 Statistically significant Biological Process terms from GO term enrichment analysis

GO ID | GO term | P-value
GO:0048518 | Positive regulation of biological process | 0.002
GO:0016032 | Viral reproduction | 0.002
GO:0048519 | Negative regulation of biological process | 0.005
GO:0009987 | Cellular process | 0.006
GO:0040007 | Growth | 0.008
GO:0018126 | Protein amino acid hydroxylation | 0.008
GO:0032501 | Multicellular organismal process | 0.009
GO:0035110 | Leg morphogenesis | 0.01
GO:0007154 | Cell communication | 0.01
GO:0016271 | Tissue death | 0.011
GO:0051704 | Multi-organism process | 0.014
GO:0090046 | Regulation of transcription regulator activity | 0.014
GO:0050896 | Response to stimulus | 0.015
GO:0044403 | Symbiosis, encompassing mutualism through parasitism | 0.015
GO:0001775 | Cell activation | 0.016
GO:0065007 | Biological regulation | 0.017
GO:0023052 | Signaling | 0.019
GO:0032502 | Developmental process | 0.021
GO:0034465 | Response to carbon monoxide | 0.021
GO:0014071 | Response to cycloalkane | 0.023
GO:0006793 | Phosphorus metabolic process | 0.023
GO:0051098 | Regulation of binding | 0.026
GO:0000003 | Reproduction | 0.032
GO:0045342 | MHC class II biosynthetic process | 0.033
GO:0001816 | Cytokine production | 0.037
GO:0008356 | Asymmetric cell division | 0.037
GO:0046417 | Chorismate metabolic process | 0.038
GO:0030431 | Sleep | 0.038
GO:0048610 | Reproductive cellular process | 0.039
GO:0007610 | Behaviour | 0.043

Table 3 Statistically significant Cellular Component terms from GO term enrichment analysis

GO ID | GO term | P-value
GO:0005802 | Trans-Golgi network | 0.002
GO:0071944 | Cell periphery | 0.004
GO:0005737 | Cytoplasm | 0.009
GO:0045121 | Membrane raft | 0.024
GO:0048786 | Presynaptic active zone | 0.05

Table 4 Statistically significant Molecular Function terms from GO term enrichment analysis

GO ID | GO term | P-value
GO:0060089 | Molecular transducer activity | 0.008
GO:0003682 | Chromatin binding | 0.008
GO:0042802 | Identical protein binding | 0.011
GO:0019838 | Growth factor binding | 0.011
GO:0046983 | Protein dimerization activity | 0.011
GO:0004713 | Protein tyrosine kinase activity | 0.013
GO:0019144 | ADP-sugar diphosphatase activity | 0.02
GO:0004883 | Glucocorticoid receptor activity | 0.023
GO:0030545 | Receptor regulator activity | 0.035
GO:0050998 | Nitric-oxide synthase binding | 0.047
GO:0001871 | Pattern binding | 0.048
GO:0070851 | Growth factor receptor binding | 0.049

Functional repertoire of proteins involved in 3D domain swapping

Protein domain enrichment analysis revealed that five protein domain families were enriched in the dataset (see Table 1): the protein tyrosine kinase domain, a member of the kinase domain family involved in signal transduction [86]; the cystatin domain, a member of the cysteine protease inhibitor family [87]; the leucine-rich repeat C-terminal domain, a unique motif that mediates protein-protein interactions [88]; the guanylate kinase domain, named after the enzyme that catalyzes the ATP-dependent phosphorylation of guanosine monophosphate (GMP) to guanosine diphosphate (GDP) [89]; and the immunoglobulin I-set domain, found in several cell adhesion molecules [90]. We noted that the significantly enriched conserved protein domains associated with 3D domain swapping play pivotal roles in various signaling pathways, which suggests a role for domain swapping in multiple signal transduction events.

Statistically significant GO terms associated with swapped proteins

GO term enrichment analysis revealed that multiple terms in the three GO categories were associated with swapped proteins encoded in the human proteome: 31 GO terms in the biological process category (Table 2), five in the cellular component category (Table 3) and 12 in the molecular function category (Table 4). DAG structures with the enriched GO terms highlighted are provided for the biological process (Additional file 2: Figure S1), cellular component (Figure 4) and molecular function (Additional file 3: Figure S2) categories. The biological process category contains several non-specific and specific GO terms that point towards a functional understanding of the proteins involved in 3D domain swapping. Notable biological process terms include viral reproduction and protein amino acid hydroxylation. Two cellular transport-related terms in the cellular component category (membrane raft and trans-Golgi network), along with cytoplasm and cell periphery, were also associated with human proteins involved in 3D domain swapping. Enriched molecular function terms indicate that human proteins involved in 3D domain swapping participate in multiple signaling and binding activities, including chromatin binding, protein tyrosine kinase activity and protein dimerization activity. This also suggests a specific role of swapped proteins in mechanisms such as oligomerization, macromolecular crowding and aggregation, which are considered to be cellular mechanisms implicated in 3D domain swapping. GO term enrichment analysis thus provided an overview of the biological processes, cellular components and molecular functions associated with 3D domain swapping.

Implications of 3D domain swapping in in biochemical 
pathways 

Results from pathway enrichment analysis using the Bioconductor-based SubPathwayMiner package indicate that proteins in the swapped conformation participate in multiple biological pathways. Results of the pathway enrichment analysis using KEGG annotations are provided in Table 5. The KEGG database organizes pathways in a top-level functional hierarchy, the KEGG BRITE hierarchy. According to this hierarchy, human pathways are classified into six categories (Metabolism, Genetic Information Processing, Environmental Information Processing, Cellular Processes, Organismal Systems and Human Diseases). The current analysis reveals that proteins in swapped conformations are present in all six classes, but significantly enriched KEGG pathways were observed in every class except Genetic Information Processing. Proteins involved in 3D domain swapping are found in multiple subcategories of the KEGG pathway hierarchy (see Figure 5). KEGG pathway analysis indicated that proteins in the swapped conformation are significantly enriched in four subclasses of the Human Diseases class, viz. Cancers, Immune System Diseases, Infectious Diseases and Neurodegenerative Diseases. These proteins also occur in other disease subclasses of the KEGG BRITE hierarchy, such as Cardiovascular Diseases (see Table 5).
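The class-level statements above can be cross-checked by tallying the pathways of Table 5 by top-level and second-level KEGG BRITE class. In the sketch below the rows are transcribed from Table 5, and the P < 0.05 cut-off is assumed by analogy with the threshold the study states for its Disease Ontology analysis. The helper name is hypothetical and this is an illustration, not part of the authors' SubPathwayMiner workflow. The 0.013 row, whose name is garbled in the extracted table, is entered under its KEGG name; only its BRITE class, which is legible, affects the counts.

```python
from collections import Counter

# (pathway name, P-value, KEGG BRITE class) transcribed from Table 5.
TABLE5 = [
    ("Pathways in cancer", 0.000, "Human Diseases; Cancers"),
    ("Neurotrophin signaling pathway", 0.000, "Organismal Systems; Nervous System"),
    ("Malaria", 0.000, "Human Diseases; Infectious Diseases"),
    ("Jak-STAT signaling pathway", 0.000, "Environmental Information Processing; Signal Transduction"),
    ("Epithelial cell signaling in Helicobacter pylori infection", 0.000, "Human Diseases; Infectious Diseases"),
    ("Renal cell carcinoma", 0.000, "Human Diseases; Cancers"),
    ("Focal adhesion", 0.001, "Cellular Processes; Cell Communication"),
    ("T cell receptor signaling pathway", 0.001, "Organismal Systems; Immune System"),
    ("Asthma", 0.002, "Human Diseases; Immune System Diseases"),
    ("Cytokine-cytokine receptor interaction", 0.002, "Environmental Information Processing; Signaling Molecules and Interaction"),
    ("Prion diseases", 0.002, "Human Diseases; Neurodegenerative Diseases"),
    ("Allograft rejection", 0.003, "Human Diseases; Immune System Diseases"),
    ("Pyruvate metabolism", 0.003, "Metabolism; Carbohydrate Metabolism"),
    ("Intestinal immune network for IgA production", 0.005, "Organismal Systems; Immune System"),
    ("Autoimmune thyroid disease", 0.006, "Human Diseases; Immune System Diseases"),
    ("Vibrio cholerae infection", 0.006, "Human Diseases; Infectious Diseases"),
    ("Acute myeloid leukemia", 0.006, "Human Diseases; Cancers"),
    ("Endocytosis", 0.008, "Cellular Processes; Transport and Catabolism"),
    ("Melanoma", 0.009, "Human Diseases; Cancers"),
    ("Bacterial invasion of epithelial cells", 0.009, "Human Diseases; Infectious Diseases"),
    ("Chronic myeloid leukemia", 0.010, "Human Diseases; Cancers"),
    ("Adherens junction", 0.010, "Cellular Processes; Cell Communication"),
    ("Phenylalanine, tyrosine and tryptophan biosynthesis", 0.010, "Metabolism; Amino Acid Metabolism"),
    ("Fc epsilon RI signaling pathway", 0.012, "Organismal Systems; Immune System"),
    ("Pancreatic cancer", 0.013, "Human Diseases; Cancers"),
    ("ErbB signaling pathway", 0.014, "Environmental Information Processing; Signal Transduction"),
    ("Apoptosis", 0.014, "Cellular Processes; Cell Growth and Death"),
    ("Gap junction", 0.015, "Cellular Processes; Cell Communication"),
    ("MAPK signaling pathway", 0.018, "Environmental Information Processing; Signal Transduction"),
    ("Amoebiasis", 0.020, "Human Diseases; Infectious Diseases"),
    ("Axon guidance", 0.029, "Organismal Systems; Development"),
    ("Tight junction", 0.031, "Cellular Processes; Cell Communication"),
]

TOP_LEVEL_CLASSES = [
    "Metabolism",
    "Genetic Information Processing",
    "Environmental Information Processing",
    "Cellular Processes",
    "Organismal Systems",
    "Human Diseases",
]


def count_by_class(rows, alpha=0.05):
    """Count pathways with P < alpha per top-level and per second-level BRITE class."""
    top, sub = Counter(), Counter()
    for _name, p, brite in rows:
        if p >= alpha:
            continue
        level1, level2 = (part.strip() for part in brite.split(";", 1))
        top[level1] += 1
        sub[(level1, level2)] += 1
    return top, sub


top, sub = count_by_class(TABLE5)
for cls in TOP_LEVEL_CLASSES:
    print(f"{cls:38s} {top[cls]}")  # Counter returns 0 for absent classes
disease_subclasses = sorted(l2 for (l1, l2) in sub if l1 == "Human Diseases")
print("Human Diseases subclasses:", ", ".join(disease_subclasses))
```

Using a `Counter` means a class with no significant pathway, such as Genetic Information Processing here, is reported as zero rather than silently omitted.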

Table 5 KEGG pathways associated with proteins involved in 3D domain swapping in the dataset.

KEGG Pathway ID | Pathway Name | P-value | KEGG BRITE class
hsa05200 | Pathways in cancer | 0.000 | Human Diseases; Cancers
hsa04722 | Neurotrophin signaling pathway | 0.000 | Organismal Systems; Nervous System
hsa05144 | Malaria | 0.000 | Human Diseases; Infectious Diseases
hsa04630 | Jak-STAT signaling pathway | 0.000 | Environmental Information Processing; Signal Transduction
hsa05120 | Epithelial cell signaling in Helicobacter pylori infection | 0.000 | Human Diseases; Infectious Diseases
hsa05211 | Renal cell carcinoma | 0.000 | Human Diseases; Cancers
hsa04510 | Focal adhesion | 0.001 | Cellular Processes; Cell Communication
hsa04660 | T cell receptor signaling pathway | 0.001 | Organismal Systems; Immune System
hsa05310 | Asthma | 0.002 | Human Diseases; Immune System Diseases
hsa04060 | Cytokine-cytokine receptor interaction | 0.002 | Environmental Information Processing; Signaling Molecules and Interaction
hsa05020 | Prion diseases | 0.002 | Human Diseases; Neurodegenerative Diseases
hsa05330 | Allograft rejection | 0.003 | Human Diseases; Immune System Diseases
hsa00620 | Pyruvate metabolism | 0.003 | Metabolism; Carbohydrate Metabolism
hsa04672 | Intestinal immune network for IgA production | 0.005 | Organismal Systems; Immune System
 | Autoimmune thyroid disease | 0.006 | Human Diseases; Immune System Diseases
hsa05110 | Vibrio cholerae infection | 0.006 | Human Diseases; Infectious Diseases
hsa05221 | Acute myeloid leukemia | 0.006 | Human Diseases; Cancers
hsa04144 | Endocytosis | 0.008 | Cellular Processes; Transport and Catabolism
hsa05218 | Melanoma | 0.009 | Human Diseases; Cancers
hsa05100 | Bacterial invasion of epithelial cells | 0.009 | Human Diseases; Infectious Diseases
hsa05220 | Chronic myeloid leukemia | 0.010 | Human Diseases; Cancers
hsa04520 | Adherens junction | 0.010 | Cellular Processes; Cell Communication
hsa00400 | Phenylalanine, tyrosine and tryptophan biosynthesis | 0.010 | Metabolism; Amino Acid Metabolism
hsa04664 | Fc epsilon RI signaling pathway | 0.012 | Organismal Systems; Immune System
hsa05212 | Pancreatic cancer | 0.013 | Human Diseases; Cancers
hsa04012 | ErbB signaling pathway | 0.014 | Environmental Information Processing; Signal Transduction
hsa04210 | Apoptosis | 0.014 | Cellular Processes; Cell Growth and Death
hsa04540 | Gap junction | 0.015 | Cellular Processes; Cell Communication
hsa04010 | MAPK signaling pathway | 0.018 | Environmental Information Processing; Signal Transduction
hsa05146 | Amoebiasis | 0.020 | Human Diseases; Infectious Diseases
hsa04360 | Axon guidance | 0.029 | Organismal Systems; Development
hsa04530 | Tight junction | 0.031 | Cellular Processes; Cell Communication

Statistically significant associations are highlighted in bold

Disease implications of proteins involved in 3D domain swapping

Since KEGG pathways represent biochemical pathways and disease pathways in a single framework, a further, more detailed analysis of human proteins in the swapped conformation was performed using a dedicated ontology that defines human diseases. A functional disease ontology annotation tool that uses the Disease Ontology-derived "Disease Ontology-lite" and GeneRIFs was used in this analysis, because of the brevity of its terms and the availability of significant gene-disease association data. Enrichment analysis using the Disease Ontology provided a detailed overview of the statistically significant associations between gene products in the swapped conformation and various disease categories. In the current subset of data, the Disease Ontology-based enrichment analysis identified five major classes of diseases: cancer (prostate cancer, thyroid cancer, breast cancer and neoplasm metastasis), diseases of the respiratory or pulmonary system (asthma, bronchial hyperreactivity and pulmonary alveolar proteinosis), degenerative diseases of the central nervous system (amyotrophic lateral sclerosis and Parkinson's disease), vascular disease (atherosclerosis and hypertension) and encephalitis (rabies).
Neurodegenerative diseases are well known to be strongly associated with 3D domain swapping, but the associations found with other diseases suggest that more disease-associated proteins may be involved in 3D domain swapping, beyond the currently well-known group of conformational diseases. A detailed table listing each Disease Ontology term (disease), the genes associated with it and the P-value of the association is provided in Table 6.

Table 6 Disease ontology terms associated with proteins involved in 3D domain swapping

Disease Ontology term | Genes | P-value
Asthma | [illegible], TJP1, [illegible] | 0.001
Amyotrophic lateral sclerosis | MET, DCTN1, CST3 | 0.001
Bronchial hyperreactivity | IL10, IL5 | 0.001
Pulmonary alveolar proteinosis | [illegible], CST3 | 0.001
Dental plaque | IL10, TJP1, BCL2L1 | 0.002
Prostate cancer | IL10, NCOA2, GLO1, SERPINC1, CST3 | 0.002
Fatty liver | MET, IL10 | 0.003
Atherosclerosis | [illegible], EPHX2, CST3 | 0.003
Rabies | RNASE1, BCL2L1, SERPINC1 | 0.004
Parkinson's disease | IL10, EPHX2, BCL2L1 | 0.004
Thyroid cancer | IL10, TJP1 | 0.023
Neoplasm metastasis | IL10, RNASE1, CST3 | 0.024
Hypertension | IL10, EPHX2, CST3 | 0.028
Breast cancer | IL10, NCOA2, [illegible], CST3 | 0.03
Lung cancer | IL10, [illegible] | 0.037
Adenovirus infection | PTK2, BCL2L1 | 0.072
Abortion | NOD1, IL10 | [illegible]
Autistic disorder | MET, GLO1 | 
Kidney disease | PTK2, SERPINC1 | 0.101
Kidney failure | IL10, CST3 | [illegible]
Enteritis | NOD1, IL10 | 0.142
Autoimmune disease | IL10, BCL2L1 | [illegible]
Systemic scleroderma | MET, IL10 | [illegible]
Ulcerative colitis | NOD1, IL5 | [illegible]
Multiple sclerosis | NOD1, BCL2L1 | [illegible]
Infection | NOD1, IL10 | [illegible]
[illegible] | [illegible], IL5 | [illegible]
Cancer | MET, PTK2, EPHX2, BCL2L1 | [illegible]
Lupus erythematosus | IL10, PTK2 | 0.378
Melanoma | IL10, TJP1 | 0.41
Alzheimer's disease | IL10, CST3 | 0.713
Embryoma | IL10, CST3 | 0.99
Rheumatoid arthritis | BCL2L1, CST3 | 0.99
Colon cancer | NOD1, TJP1 | 0.99
Leukemia | IL10, NCOA2 | 0.99
Diabetes mellitus | TJP1, CST3 | 0.99

Statistically significant associations are highlighted in bold

Five of the significantly enriched diseases in the dataset and their associated genes are shown as a network (Figure 6), in which genes are nodes and two genes are connected by an edge when they share a disease. The Disease Ontology is useful for mapping relationships between human genes and diseases. To extend these disease associations to clinically relevant information, we curated the Disease Ontology terms associated with 3D domain swapping to derive the corresponding International Classification of Diseases, Ninth Revision (ICD-9) codes. The diseases fall under the following ICD-9 code ranges: 001-139 (infectious and parasitic diseases), 140-239 (neoplasms), 320-359 (diseases of the nervous system), 390-459 (diseases of the circulatory system) and 460-519 (diseases of the respiratory system). This mapping further helped to identify the major classes of clinically relevant disease phenotypes mediated by a unique molecular mechanism.
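Grouping curated ICD-9 codes into the ranges listed above amounts to a range lookup on the three-digit category of each code. The sketch below uses only the five ranges named in the text; the function name is hypothetical, and the example codes (493.90 asthma, 401.9 essential hypertension, 185 malignant neoplasm of prostate, 071 rabies, 332.0 Parkinson's disease) are standard ICD-9-CM codes chosen for illustration, not the authors' curated mapping. Supplementary V and E codes, and categories outside the five ranges, return `None`.

```python
from typing import Optional

# The five ICD-9 ranges named in the text: (first category, last category, label).
ICD9_RANGES = [
    (1, 139, "Infectious and parasitic diseases"),
    (140, 239, "Neoplasms"),
    (320, 359, "Diseases of the nervous system"),
    (390, 459, "Diseases of the circulatory system"),
    (460, 519, "Diseases of the respiratory system"),
]


def icd9_group(code: str) -> Optional[str]:
    """Return the range label for a numeric ICD-9 code such as '493.90', else None."""
    category = code.strip().split(".", 1)[0]
    if not category.isdigit():  # V and E supplementary codes are not covered
        return None
    value = int(category)
    for low, high, label in ICD9_RANGES:
        if low <= value <= high:
            return label
    return None  # numeric category outside the five listed ranges


# Illustrative standard ICD-9-CM codes (not taken from the authors' curation).
examples = {
    "Asthma": "493.90",
    "Essential hypertension": "401.9",
    "Prostate cancer": "185",
    "Rabies": "071",
    "Parkinson's disease": "332.0",
}
for disease, code in examples.items():
    print(f"{disease:24s} {code:7s} -> {icd9_group(code)}")
```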

Discussion 

Domain swapping is a key pathophysiological mechanism mediating conformational diseases, yet a detailed account of the functional repertoire, molecular pathways and spectrum of diseases affected by this mechanism remains elusive. We used enrichment calculations on a curated dataset of proteins involved in 3D domain swapping to address these aspects. Our analysis was performed using a dataset of 45 unique human proteins derived from the 3DSwap knowledgebase [53]. This dataset is expected to grow, because the structural characterization of human proteins involved in domain swapping is increasing rapidly: as numerous new structures are determined, more human proteins may be found to adopt swapped conformations. Repeating the analysis with the approaches employed here may then help to identify additional protein domains, Gene Ontology terms, molecular pathways and human diseases.

Because swapping gives rise to oligomers, earlier studies have indicated that 3D domain swapping plays a crucial role in conformational or deposition diseases and in proteinopathies. Insight into the structure-function relationships of proteins involved in domain swapping has been limited, because no large dataset was available for an objective analysis of its functional or disease implications. Here, proteins encoded in the human genome and reported to be involved in 3D domain swapping were analyzed in detail to understand the role of these gene products in various classes of diseases beyond conformational diseases or proteinopathies. Mapping and enrichment analysis of these proteins against the KEGG pathways in the 'disease' class and against the Disease Ontology indicate that they play a significant role in various other disease categories, in addition to the well-known neurodegenerative or conformational diseases.
Figure 4 Gene Ontology enrichment analysis (Cellular Component) using unique human proteins from the dataset. Colored nodes
indicate enriched terms associated with proteins involved in 3D domain swapping. 



Genome-scale sequence data and annotations are considered an ideal resource for gaining new insights from the plethora of available biological data. The functional aspects of a structural mechanism can be illuminated by mapping it onto annotations and performing database-wide enrichment analysis; in a similar way, functional mechanisms may gain new insight from the knowledge-based approaches employed in this study. In summary, the present study applies knowledge-based approaches to gain new functional insights into a structural mechanism. Starting from an initial dataset of protein structures, it shows the importance and impact of data integration and data mining for deriving biologically relevant interpretations of the global trends of a structural mechanism from sequence, functional and disease perspectives. Focusing on the proteins involved in 3D domain swapping in the human genome provides further insights from a translational perspective. Depending on the swapped conformation, 3D domain swapping may affect the availability of the active sites and binding sites required for biological function. Future drug design studies may therefore need to consider these aspects when developing therapeutics for the disease categories in which 3D domain swapping is observed.

Figure 5 Comparison of KEGG BRITE hierarchies in the KEGG database and of proteins from the human dataset mapped to the KEGG BRITE hierarchy (panel title: Distribution by various classes of KEGG BRITE hierarchy). HD = Human Diseases; OS = Organismal Systems; CP = Cellular Processes; EIP = Environmental Information Processing; GIP = Genetic Information Processing; Met = Metabolism.

Figure 6 Disease Ontology term - gene network derived from Disease Ontology enrichment analysis using human proteins involved in 3D domain swapping
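The Results define the Figure 6 network with genes as nodes and an edge between two genes whenever they share a disease. The sketch below builds such a gene-gene network as an adjacency map from a term-to-genes table, keeping the shared terms on each edge. As an assumption for illustration, the input uses three Table 6 rows whose gene lists are fully legible in the extracted text (chosen regardless of significance); these are not necessarily the five diseases drawn in Figure 6, and the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Disease term -> associated genes, copied from three legible rows of Table 6.
# Illustrative input only; not the authors' Figure 6 data.
TERM_GENES = {
    "Bronchial hyperreactivity": ["IL10", "IL5"],
    "Dental plaque": ["IL10", "TJP1", "BCL2L1"],
    "Alzheimer's disease": ["IL10", "CST3"],
}


def gene_network(term_genes):
    """Build an undirected gene-gene network: an edge joins two genes that share
    at least one disease term, and each edge keeps the set of shared terms."""
    edges = defaultdict(set)
    for term, genes in term_genes.items():
        # Sorting gives each unordered gene pair one canonical (a, b) key.
        for a, b in combinations(sorted(set(genes)), 2):
            edges[(a, b)].add(term)
    return dict(edges)


network = gene_network(TERM_GENES)
for (a, b), terms in sorted(network.items()):
    print(f"{a} -- {b}: {', '.join(sorted(terms))}")
```

Storing the shared terms on each edge keeps the disease information that a plain gene-gene graph would otherwise drop; genes linked by several diseases simply accumulate more terms on the same edge.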

Clinical relevance of 3D domain swapping 

In the current era of personal genomes and network medicine, clinical and therapeutic research increasingly relies on integrated approaches to understand disease states and pathophysiological mechanisms. Complex disease states are often triggered by perturbations in multiple pathways involving multiple genes [91-94]. Protein structures and structural mechanisms play an important role in the phenotypic impact of various diseases and signaling pathways [95-101]. Protein structural information is routinely used to identify drug targets and to develop effective drugs [102-104]. New approaches will be required to target proteins in the swapped conformation, or the biochemical pathways that contain them. Our study illustrates the application of biological and biomedical enrichment tools, ontologies and annotations to understand the functional role and disease implications of an important structural mechanism from the global perspective of the human proteome.



Insights from our Disease Ontology analysis indicate that 3D domain swapping is not confined to neurodegenerative diseases: proteins in swapped conformations play a significant role in several other classes of diseases, such as cancer, vascular disease and pulmonary disease. The enrichment results discussed in this paper may be useful for future studies of swapped proteins from biochemical, functional, structural and therapeutic perspectives. Our analysis also suggests that a genome-specific analysis of proteins involved in 3D domain swapping, within a comparative genome analysis framework, may further our understanding of the functional, structural and pathophysiological manifestations of 3D domain swapping.

Conclusion 

3D domain swapping is an important structural mechanism associated with a diverse set of proteins involved in a multitude of biological processes, molecular functions and diseases, including proteinopathies. The phenomenon is usually studied from the perspective of protein structure; its impact on biological pathways, its correlations with biological functions and its association with classes of diseases other than conformational diseases were largely unknown. We performed a knowledge-based analysis of human proteins involved in 3D domain swapping to find the key functions, pathways and diseases associated with this mechanism. Our study was limited to 45 unique proteins involved in 3D domain swapping. Because of its primary role in protein oligomerization, 3D domain swapping is a functionally relevant phenomenon, and proteins with swapped oligomeric states are identified regularly in crystallography experiments. Effective algorithms that can predict swapping from structural and sequence information may also help to identify more proteins in swapped conformations. As more proteins are characterized in swapped conformations, repeating such knowledge-based analyses with new proteins, improved annotations and enhanced ontologies may reveal additional functional classes, pathways and diseases. In summary, we present the results of an initial investigation into the conserved protein domains, functional repertoire, pathways and diseases mediated by 3D domain swapping in the human proteome.